MCMC - Performance

Lecture 25

Dr. Colin Rundel

Stan

Stan in Python & R

At the moment both Python & R offer two variants of Stan:

  • pystan & RStan - native language interface to the underlying Stan C++ libraries

    • Former does not play nicely with Jupyter (or quarto or positron) - see here for fix
  • CmdStanPy & CmdStanR - are wrappers around the CmdStan command line interface

    • Interface is through files (e.g. model.stan)

Any of the above tools will require a modern C++ toolchain (requires C++17 support).

Stan process

Stan file basics

Stan code is divided up into specific blocks depending on usage - all of the following blocks are optional but the ordering has to match what is given below.

functions {
  // user-defined functions
}
data {
  // declares the required data for the model
}
transformed data {
   // allows the definition of constants and transforms of the data
}
parameters {
   // declares the model’s parameters
}
transformed parameters {
   // allows variables to be defined in terms of data and parameters
}
model {
   // defines the log probability function
}
generated quantities {
   // allows derived quantities based on parameters, data, and random number generation
}

A basic example

      Lec25/bernoulli.stan

data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1); // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}

      Lec25/bernoulli.json

{
    "N" : 10,
    "y" : [0,1,0,0,0,0,0,0,0,1]
}

Build & fit the model

from cmdstanpy import CmdStanModel
model = CmdStanModel(stan_file='Lec25/bernoulli.stan')
fit = model.sample(data='Lec25/bernoulli.json')
                                                                                                                                                                                                                                                                                                                                

Results

type(fit)
cmdstanpy.stanfit.mcmc.CmdStanMCMC
fit
CmdStanMCMC: model=bernoulli chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_1.csv
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_2.csv
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_3.csv
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_4.csv
 output_files:
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_0-stdout.txt
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_1-stdout.txt
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_2-stdout.txt
    /var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/tmpcy2xjnn5/bernoulli2nghp52g/bernoulli-20250414213140_3-stdout.txt

Summary

fit.summary()
Mean MCSE StdDev MAD 5% 50% 95% ESS_bulk ESS_tail R_hat
lp__ -7.221240 0.015734 0.670917 0.294511 -8.595310 -6.964960 -6.74960 1920.35 2176.83 1.00222
theta 0.253394 0.003067 0.115330 0.115703 0.086205 0.239928 0.46514 1363.50 1735.65 1.00127

Trace plots

ax = az.plot_trace(fit, compact=False)
plt.show()

Diagnostics

print(fit.diagnose())
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Rank-normalized split effective sample size satisfactory for all parameters.

Rank-normalized split R-hat values satisfactory for all parameters.

Processing complete, no problems detected.